home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Danny Amor's Online Library
/
Danny Amor's Online Library - Volume 1.iso
/
bbs
/
society
/
society.lha
/
PUB
/
isoc_news
/
1-3
/
n-1-3-040.31.1a
< prev
next >
Wrap
Text File
|
1995-07-21
|
5KB
|
85 lines
N-1-3-040.31.1, Attribute Distribution and Search for Internet
Resource Discovery, by Michael F. Schwartz*,
<schwartz@latour.cs.colorado.edu>
In the previous issue of Internet Society News, I pointed out that
resource discovery involves two basic problems: characterizing the
resources of interest using name/attribute descriptions, and
distributing this information so it can be searched flexibly and
efficiently. I also discussed a number of approaches to the
characterization problem. The current article considers attribute
distribution and search.
The most straightforward solution to the distribution/search problem
is to centralize resource information. This approach is taken by
archie, which stores anonymous FTP directory listings on a central
server. WAIS uses a centralized server to maintain a directory of
WAIS servers. To date, centralized information has worked quite well
in archie and WAIS. Archie maintains information about nearly 1,000
Internet archive sites, and handles thousands of queries per day.
There are hundreds of WAIS servers registered in the top-level
directory, and new servers are added often.
The problem with a centralized solution, of course, is that the
central server can become a performance bottleneck and a critical
point of failure, particularly as the scale of the system increases.
The difficulty in sustaining reasonable response times in the face of
tremendous popular demand for archie has moved the community to create
replica servers. Doing so distributes the load, yet creates auxiliary
problems of distributing the data and maintaining consistency between
replicas. A future version of archie will address these problems
using "lazy" update semantics to distribute data among replicas.
To reduce the scalability and consistency problems of a fully
replicated directory, one can chose a solution where only parts of the
resource data are maintained on any particular server. A common
approach is to impose some organizational properties on the data, and
distribute data according to these properties. For example, the X.500
directory service standard divides information hierarchically. The
tree is divided by country at the top level, and by administrative
organization (company, university, etc.) at the next level down.
Since the information in a hierarchy can be divided into arbitrarily
many pieces, hierarchical directories scale well. Yet, it is only
efficient to search hierarchical information according to the one way
it is organized. For example, in X.500 it is efficient to find
information about a person from a known country and organization, but
it would be infeasible to find people according to their technical
interests or other criteria that exist in the individual resource
records, but that are not represented in the tree structure.
One can mimic the effect of representing multiple search criteria in a
hierarchy by maintaining separate structures with symbolic links to
the "main" data, but searches still require expensive distributed
operations. If, on the other hand, one does not support search
operations, symbolic linking can provide an acceptable mechanism. For
example, the Prospero file system uses symbolic links to provide views
of information in anonymous FTP and other Internet file systems.
Users can browse the information, but search operations are not
supported.
There are other ways to partially replicate resource attribute data
beyond hierarchical distribution. One approach is to distribute
information randomly among a set of servers, and cache the most
popular information at each server. This approach requires that one
sacrifice the ability to perform exhaustive searches. I experimented
with one such protocol, optimized for locating a subset of the
available copies of popular resources.
Another approach is to construct records describing particular
collections of information available in a structured information space
(such as a hierarchical file system) and register these records into
auxiliary indices that can be searched with "flat" search operations.
For example, one could register particularly important/popular
directories in a large file system into indices focused on particular
technical topics, so that users can search for information about these
topics without regard to the organization in the main file system.
This approach is used by the perspective discovery paradigm discussed
in the previous issue of this newsletter.
*Assistant Professor, Department of Computer Science, University of
Colorado - Boulder